Recommended from our members
Rényi divergence variational inference
This paper introduces the variational Rényi bound (VR) that extends traditional variational inference to Rényi's α-divergences. This new family of variational methods unifies a number of existing approaches, and enables a smooth interpolation from the evidence lower-bound to the log (marginal) likelihood that is controlled by the value of α that parametrises the divergence. The reparameterization trick, Monte Carlo approximation and stochastic optimisation methods are deployed to obtain a tractable and unified framework for optimisation. We further consider negative α values and propose a novel variational inference method as a new special case in the proposed framework. Experiments on Bayesian neural networks and variational auto-encoders demonstrate the wide applicability of the VR bound.
YL thanks the Schlumberger Foundation FFTF fellowship. RET thanks EPSRC grants EP/M026957/1 and EP/L000776/1.
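The Monte Carlo estimator of the VR bound can be sketched as follows. This is a minimal illustration, not the authors' code: the toy conjugate model and all function names are assumptions of mine, and α = 1 is excluded (its limit recovers the standard ELBO).

```python
import numpy as np

def vr_bound(log_p_joint, log_q, sample_q, alpha, n_samples, rng):
    """Monte Carlo estimate of the variational Renyi bound
    L_alpha = (1/(1-alpha)) * log E_q[(p(x,z)/q(z))^(1-alpha)],
    computed with a log-mean-exp for numerical stability."""
    z = sample_q(rng, n_samples)
    log_w = log_p_joint(z) - log_q(z)              # log importance weights
    scaled = (1.0 - alpha) * log_w
    m = scaled.max()
    return (m + np.log(np.mean(np.exp(scaled - m)))) / (1.0 - alpha)

# Toy conjugate model (my assumption): z ~ N(0,1), x|z ~ N(z,1), observed
# x = 1, so the marginal likelihood is N(1; 0, 2). With q set to the exact
# posterior N(0.5, 0.5), the bound is tight for every alpha.
x_obs = 1.0
log_p_joint = lambda z: -np.log(2 * np.pi) - 0.5 * z**2 - 0.5 * (x_obs - z)**2
log_q = lambda z: -0.5 * np.log(np.pi) - (z - 0.5)**2
sample_q = lambda rng, n: 0.5 + np.sqrt(0.5) * rng.standard_normal(n)

est = vr_bound(log_p_joint, log_q, sample_q, alpha=0.5,
               n_samples=2000, rng=np.random.default_rng(0))
```

Because q equals the exact posterior in this toy, the importance weights are constant and the estimate matches the log marginal likelihood regardless of α; with an approximate q, different α values trade off mass-covering against mode-seeking behaviour.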
Neural adaptive sequential Monte Carlo
Sequential Monte Carlo (SMC), or particle filtering, is a popular class of
methods for sampling from an intractable target distribution using a sequence
of simpler intermediate distributions. Like other importance sampling-based
methods, performance is critically dependent on the proposal distribution: a
bad proposal can lead to arbitrarily inaccurate estimates of the target
distribution. This paper presents a new method for automatically adapting the
proposal using an approximation of the Kullback-Leibler divergence between the
true posterior and the proposal distribution. The method is very flexible,
applicable to any parameterized proposal distribution and it supports online
and batch variants. We use the new framework to adapt powerful proposal
distributions with rich parameterizations based upon neural networks leading to
Neural Adaptive Sequential Monte Carlo (NASMC). Experiments indicate that NASMC
significantly improves inference in a non-linear state space model
outperforming adaptive proposal methods including the Extended Kalman and
Unscented Particle Filters. Experiments also indicate that improved inference
translates into improved parameter learning when NASMC is used as a subroutine
of Particle Marginal Metropolis Hastings. Finally we show that NASMC is able to
train a latent variable recurrent neural network (LV-RNN) achieving results
that compete with the state-of-the-art for polyphonic music modelling. NASMC
can be seen as bridging the gap between adaptive SMC methods and the recent
work in scalable, black-box variational inference.
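The proposal-adaptation idea can be sketched on a static one-dimensional target rather than a full SMC sweep: follow importance-weighted estimates of the gradient of E_p[log q], which is the descent direction for KL(p || q). The toy target and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adapt_proposal(log_target, mu, sigma, rng, n_particles=500, steps=200, lr=0.1):
    """Adapt the mean of a Gaussian proposal N(mu, sigma^2) by stochastic
    gradient descent on KL(p || q): the gradient E_p[d log q / d mu] is
    estimated with self-normalised importance weights."""
    for _ in range(steps):
        x = mu + sigma * rng.standard_normal(n_particles)
        log_q = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
        log_w = log_target(x) - log_q
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                # self-normalised weights
        grad_mu = np.sum(w * (x - mu)) / sigma**2   # weighted grad of log q
        mu += lr * grad_mu
    return mu

# Unnormalised target N(3, 1); the adapted proposal mean moves toward 3.
mu_hat = adapt_proposal(lambda x: -0.5 * (x - 3.0)**2,
                        mu=0.0, sigma=1.5, rng=np.random.default_rng(0))
```

In NASMC the same KL(p || q) direction is estimated from the weighted particles of an SMC sweep, and q is a neural network rather than a single Gaussian.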
Streaming sparse Gaussian process approximations
Sparse pseudo-point approximations for Gaussian process (GP) models provide a
suite of methods that support deployment of GPs in the large data regime and
enable analytic intractabilities to be sidestepped. However, the field lacks a
principled method to handle streaming data in which both the posterior
distribution over function values and the hyperparameter estimates are updated
in an online fashion. The small number of existing approaches either use
suboptimal hand-crafted heuristics for hyperparameter learning, or suffer from
catastrophic forgetting or slow updating when new data arrive. This paper
develops a new principled framework for deploying Gaussian process
probabilistic models in the streaming setting, providing methods for learning
hyperparameters and optimising pseudo-input locations. The proposed framework
is assessed using synthetic and real-world datasets.
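The streaming principle can be shown in its simplest exact form, for a GP evaluated on a fixed grid of inputs (no pseudo-points or hyperparameter learning, which are the paper's actual contributions): the posterior after one batch serves as the prior for the next, and the result matches batch inference. All names here are my own.

```python
import numpy as np

def rbf(a, b, ell=0.25, sf=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def condition(mu, S, idx, y, noise_var):
    """Condition a Gaussian N(mu, S) over grid values f on y = f[idx] + noise."""
    H = np.zeros((len(idx), len(mu)))
    H[np.arange(len(idx)), idx] = 1.0
    G = S @ H.T @ np.linalg.inv(H @ S @ H.T + noise_var * np.eye(len(idx)))
    return mu + G @ (y - H @ mu), S - G @ H @ S

grid = np.linspace(0.0, 1.0, 20)
mu0 = np.zeros(20)
S0 = rbf(grid, grid) + 1e-8 * np.eye(20)        # jitter for stability

idx1, y1 = np.array([2, 5, 9]), np.array([0.1, 0.4, -0.2])
idx2, y2 = np.array([12, 17]), np.array([0.3, 0.0])

# Streaming: the posterior after batch 1 is the prior for batch 2.
mu_s, S_s = condition(mu0, S0, idx1, y1, 0.1)
mu_s, S_s = condition(mu_s, S_s, idx2, y2, 0.1)
# Batch: condition on both batches at once; the two answers coincide.
mu_b, S_b = condition(mu0, S0, np.concatenate([idx1, idx2]),
                      np.concatenate([y1, y2]), 0.1)
```

The paper's framework preserves this prior-becomes-posterior recursion while summarising the data with pseudo-points, which is what makes online hyperparameter and pseudo-input updates possible without revisiting old data.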
The Multivariate Generalised von Mises distribution: Inference and applications
Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression that is inspired by Gaussian processes, and a method for probabilistic principal component analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. covariance functions and methods for automatic relevance determination). Third, we show that the posterior distribution in these models is an mGvM distribution, which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.
AKWN thanks CAPES grant BEX 9407-11-1. JF thanks the Danish Council for Independent Research grant 0602-02909B. RET thanks EPSRC grants EP/L000776/1 and EP/M026957/1.
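The restriction-to-the-torus construction can be sketched as an unnormalised log-density (the normalising constant is the hard part that the paper's variational scheme addresses); the function name and test values are illustrative assumptions.

```python
import numpy as np

def mgvm_log_density_unnorm(theta, mu, prec):
    """Unnormalised log-density of the mGvM construction: a 2D-dimensional
    Gaussian with mean mu and precision matrix prec, restricted to the unit
    hyper-torus by embedding each angle as (cos theta_i, sin theta_i)."""
    x = np.ravel(np.column_stack([np.cos(theta), np.sin(theta)]))
    d = x - mu
    return -0.5 * d @ prec @ d

# Sanity check: one circular variable with mean direction (kappa, 0) and
# identity precision reduces to the von Mises density, so
# log f(theta) - log f(0) = kappa * (cos theta - 1).
kappa = 2.0
lf = lambda t: mgvm_log_density_unnorm(np.array([t]),
                                       np.array([kappa, 0.0]), np.eye(2))
```

A full precision matrix couples the angles, which is how the mGvM generalises the independent von Mises case.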
Learning stationary time series using Gaussian processes with nonparametric kernels
A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation
Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
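Power EP's cavity/tilt/update cycle can be sketched on a toy one-dimensional model with Gaussian factors, where moment matching is exact and the fixed point is the exact posterior for any power α. This is my illustration of the generic update pattern, not the paper's pseudo-point algorithm.

```python
import numpy as np

def power_ep_1d(prior, sites, alpha, sweeps=10):
    """Power EP on a toy 1-D model: posterior ∝ N(prior) * prod_i N(site_i).
    Gaussians are stored as natural parameters (precision, precision*mean).
    Per site: form the cavity by removing an alpha-fraction of the
    approximating factor, moment-match the tilted distribution (exact here,
    since every factor is Gaussian), then update the site with step 1/alpha."""
    m0, v0 = prior
    lam0 = np.array([1.0 / v0, m0 / v0])
    f_exact = [np.array([1.0 / v, m / v]) for m, v in sites]
    f_tilde = [np.zeros(2) for _ in sites]          # approximating factors
    q = lam0 + np.sum(f_tilde, axis=0)
    for _ in range(sweeps):
        for i, f in enumerate(f_exact):
            cav = q - alpha * f_tilde[i]            # cavity distribution
            q_proj = cav + alpha * f                # projected tilted dist.
            f_tilde[i] = f_tilde[i] + (q_proj - q) / alpha
            q = lam0 + sum(f_tilde)                 # refresh global approx.
    prec, pm = q
    return pm / prec, 1.0 / prec                    # posterior mean, variance

# Prior N(0,1) with Gaussian "likelihood" factors N(1,1) and N(2,0.5):
# exact posterior has precision 1+1+2 = 4 and mean 5/4, for any alpha.
m, v = power_ep_1d(prior=(0.0, 1.0), sites=[(1.0, 1.0), (2.0, 0.5)], alpha=0.5)
```

With non-Gaussian factors the projection step becomes a genuine moment match, and the choice of α interpolates between EP (α = 1) and variational free-energy behaviour (α → 0), which is the dial the paper's unification turns.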
Estimation of auditory filter shapes across frequencies using machine learning
When fitting a hearing aid, the level-dependent gain prescribed at each frequency is usually based on the hearing loss at that frequency. This often results in reasonable fittings for a typical cochlear hearing loss, but may fail when the individual frequency selectivity and/or loudness growth are different from what would be typical for that hearing loss. Individualised fitting based on measures of frequency selectivity might be useful in improving a fitting, for example by reducing across-channel masking. A popular measure of frequency selectivity is the notched-noise method, but this test is time-consuming. To reduce testing time, Shen and Richards (2013) proposed an efficient machine-learning test that determines the slope of the skirts of the auditory filter (p), its minimum response for wide notches (r), and detection efficiency (K). However, their test did not determine asymmetries in the auditory filter, which are important to consider during fitting to reduce across-channel masking.
The test proposed here provides a time-efficient way of estimating the auditory filter shape and asymmetry as a function of center frequency. The noise level required for threshold is estimated for a tone with frequency fs presented at 15 dB SL in nine symmetric or asymmetric notched noises with notch edge frequencies between 0.6 and 1.4 fs. Using only narrow to medium notch widths provides good information about the tip of the auditory filter, which is of most importance in determining across-channel masking for speech-like signals (but the tail is not well defined). The nine thresholds for a given fs can be used to fit an auditory filter model with three parameters: the slopes of the lower and upper sides (pl, pu) and K. In practice, these model parameters are estimated as a continuous function of fs, and fs is varied across trials over the range 0.5-4 kHz. The stimulus parameters on a given trial (fs, notch condition, noise level) are chosen to maximally reduce the uncertainty in the model parameters, exploiting the covariance between thresholds for adjacent values of fs.
Six subjects have been tested so far. The whole procedure took about 45 minutes per ear. The lower slopes typically corresponded with values expected from the audiogram and a cochlear hearing loss. The upper slopes were steeper in some cases, although not necessarily across the whole frequency range.
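The three-parameter filter shape (pl, pu, plus efficiency K) can be sketched with the standard asymmetric rounded-exponential (roex) form under the power-spectrum model of masking. The roex form is standard, but this code and its numbers are an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def roex_asym(f, fc, pl, pu):
    """Asymmetric rounded-exponential (roex) auditory filter weighting
    W(g) = (1 + p*g) * exp(-p*g), where g is the normalised deviation from
    the centre frequency fc; slope pl applies below fc and pu above it."""
    g = np.abs(np.asarray(f, dtype=float) - fc) / fc
    p = np.where(np.asarray(f) < fc, pl, pu)
    return (1 + p * g) * np.exp(-p * g)

def notched_noise_power(fc, pl, pu, lo_edge, hi_edge, n=20001):
    """Relative noise power passed by the filter when the masker has a
    spectral notch between lo_edge and hi_edge (power-spectrum model);
    at threshold the signal-to-output-noise ratio equals K."""
    f = np.linspace(fc * 0.2, fc * 2.0, n)
    in_noise = (f < lo_edge) | (f > hi_edge)
    w = roex_asym(f, fc, pl, pu)
    return np.sum(w * in_noise) * (f[1] - f[0])   # rectangle-rule integral
```

Widening the notch reduces the noise passed by the filter, which is why thresholds fall with notch width and why asymmetric notches separate pl from pu.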
Reference
Shen, Y., and Richards, V. M. (2013). "Bayesian adaptive estimation of the auditory filter," J. Acoust. Soc. Am. 134, 1134-1145.
Stochastic expectation propagation
Expectation propagation (EP) is a deterministic approximation algorithm that
is often used to perform approximate Bayesian parameter learning. EP
approximates the full intractable posterior distribution through a set of local
approximations that are iteratively refined for each datapoint. EP can offer
analytic and computational advantages over other approximations, such as
Variational Inference (VI), and is the method of choice for a number of models.
The local nature of EP appears to make it an ideal candidate for performing
Bayesian learning on large models in large-scale dataset settings. However, EP
has a crucial limitation in this context: the number of approximating factors
needs to increase with the number of data-points, N, which often entails a
prohibitively large memory overhead. This paper presents an extension to EP,
called stochastic expectation propagation (SEP), that maintains a global
posterior approximation (like VI) but updates it in a local way (like EP).
Experiments on a number of canonical learning problems using synthetic and
real-world datasets indicate that SEP performs almost as well as full EP, but
reduces the memory consumption by a factor of N. SEP is therefore ideally
suited to performing approximate Bayesian learning in the large model, large
dataset setting.
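SEP's single-shared-site update can be sketched on the conjugate problem of inferring a Gaussian mean, where the tilted projection is exact. One shared site (two numbers) stands in for the N per-datapoint sites of full EP; all names here are my own.

```python
import numpy as np

def sep_gaussian_mean(x, prior_prec=1.0, sweeps=200):
    """Stochastic EP for the mean of a Gaussian likelihood with unit
    variance. A single shared site replaces the N per-datapoint sites of
    full EP, cutting memory from O(N) to O(1). Gaussians are stored as
    natural parameters (precision, precision*mean)."""
    N = len(x)
    lam0 = np.array([prior_prec, 0.0])
    site = np.zeros(2)                          # the one shared site
    for _ in range(sweeps):
        for xn in x:
            q = lam0 + N * site                 # implicit global posterior
            cav = q - site                      # remove one copy of the site
            q_new = cav + np.array([1.0, xn])   # exact tilted projection here
            site = site + (q_new - q) / N       # damped (1/N) site update
    q = lam0 + N * site
    return q[1] / q[0], 1.0 / q[0]              # posterior mean, variance

rng = np.random.default_rng(1)
x = rng.standard_normal(100) + 2.0
m_sep, v_sep = sep_gaussian_mean(x)             # close to the exact posterior
```

The 1/N damping is what lets a single site absorb the average effect of all likelihood terms; the recovered posterior closely matches the exact conjugate answer (precision 1 + N, mean sum(x)/(1 + N)).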
A generative model for natural sounds based on latent force modelling
Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches do not incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective.
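The kind of physical prior knowledge being encoded can be caricatured in discrete time: a first-order system whose amplitude envelope decays exponentially between excitations by a latent force. This is far simpler than the paper's continuous-time GP formulation, and the names and decay constant are illustrative assumptions.

```python
import numpy as np

def envelope_from_force(force, decay):
    """Discrete-time first-order latent force model: the subband amplitude
    envelope decays exponentially between excitations by a latent force u,
    env[t] = (1 - decay) * env[t-1] + u[t]."""
    env = np.zeros(len(force))
    prev = 0.0
    for t, u in enumerate(force):
        prev = (1.0 - decay) * prev + u
        env[t] = prev
    return env

# An impulse excitation produces the characteristic exponential decay.
env = envelope_from_force(np.array([1.0, 0.0, 0.0, 0.0]), decay=0.1)
```

In the latent force model proper, the force u is itself given a GP prior and the decay dynamics are folded into the covariance function, so inferring u from the envelopes recovers an interpretable driving source.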
Bursty bulk flow turbulence as a source of energetic particles to the outer radiation belt
We report observations of a Bursty Bulk Flow (BBF) penetrating close to the outer edge of the radiation belt. The turbulent BBF braking region is characterized by ion velocity fluctuations, magnetic field (B) variations, and intense electric fields (E). In this event, energetic (>100 keV) electron and ion fluxes are appreciably enhanced. Importantly, fluctuations in energetic electrons and ions suggest local energization. Using correlation distances and other observed characteristics of turbulent E, test-particle simulations support local energization by E that favors higher-energy electrons and leads to an enhanced energetic shoulder and tail in the electron distributions. The energetic shoulder and tail could be amplified to MeV energies by adiabatic transport into the radiation belt where |B| is higher. This analysis suggests that turbulence generated by BBFs can, in part, supply energetic particles to the outer radiation belt and that turbulence can be a significant contributor to particle acceleration